[Performance] Add --enable-ep-weight-filter CLI option #37351

Merged
esmeetu merged 1 commit into main from opt-ep-weights-filter on Mar 18, 2026

@esmeetu (Member) commented Mar 17, 2026

Summary

Usage

vllm serve model \
  --enable-expert-parallel \
  --enable-ep-weight-filter

Without --enable-ep-weight-filter, loading behavior is identical to main.

Test plan

  • vllm serve without --enable-ep-weight-filter — no behavior change
  • vllm serve --enable-expert-parallel --enable-ep-weight-filter on per-expert MoE — correct loading, reduced I/O
  • Non-MoE model with flag — no effect
  • 3D fused-expert model with flag — no effect (filter returns None)

🤖 Generated with Claude Code

Add opt-in flag to skip non-local expert weights during model loading
when expert parallelism is active. Each rank only reads its own expert
shard from disk, reducing storage I/O for MoE models with per-expert
weight tensors.

Signed-off-by: esmeetu <esmeetu@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
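The commit message describes each rank reading only its own expert shard. The following is a minimal sketch of how such a name-based filter could work, not the code merged in this PR. The weight-name pattern, the contiguous expert-to-rank split, and the helper names (`local_expert_range`, `should_load`, `filter_names`) are all assumptions made for illustration. The `None` return loosely mirrors the "no effect" cases in the test plan (non-MoE weights and 3D fused-expert tensors), although where the real implementation returns `None` is not shown on this page.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical pattern for per-expert weight names such as
# "model.layers.3.mlp.experts.17.gate_proj.weight". Real checkpoints may differ.
_EXPERT_ID = re.compile(r"\.experts\.(\d+)\.")


def local_expert_range(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Expert ids owned by ep_rank, assuming a contiguous, near-even split."""
    base, extra = divmod(num_experts, ep_size)
    start = ep_rank * base + min(ep_rank, extra)
    stop = start + base + (1 if ep_rank < extra else 0)
    return range(start, stop)


def should_load(name: str, local: range) -> Optional[bool]:
    """True/False for per-expert tensors.

    Returns None when the name carries no expert id (dense weights or fused
    3D expert tensors), meaning the filter has nothing to say about it.
    """
    match = _EXPERT_ID.search(name)
    if match is None:
        return None
    return int(match.group(1)) in local


def filter_names(
    names: Iterable[str], num_experts: int, ep_size: int, ep_rank: int
) -> List[str]:
    """Keep everything except per-expert tensors owned by another rank."""
    local = local_expert_range(num_experts, ep_size, ep_rank)
    return [n for n in names if should_load(n, local) is not False]
```

With 8 experts over 4 ranks, rank 1 in this sketch keeps experts 2 and 3 plus every tensor whose name has no expert index. Skipping by name before reading the tensor data is what would reduce storage I/O.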
@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces an opt-in command-line flag --enable-ep-weight-filter to optimize model loading for Mixture-of-Experts models with expert parallelism. The changes correctly add the new configuration option and integrate it into the model loading logic. My main feedback is to add a validation check to ensure enable_expert_parallel is active when enable_ep_weight_filter is used, to prevent silent failures from misconfiguration and improve user experience.

Comment thread vllm/config/parallel.py
"""Whether the deployed model is MoE (if known)."""
enable_expert_parallel: bool = False
"""Use expert parallelism instead of tensor parallelism for MoE layers."""
enable_ep_weight_filter: bool = False

high

To improve robustness and prevent user confusion from misconfiguration, it's a good practice to validate that enable_expert_parallel is enabled when enable_ep_weight_filter is used. Currently, if a user enables enable_ep_weight_filter without enable_expert_parallel, it will fail silently.

Consider adding a validation check in the _validate_parallel_config method of this class, similar to how enable_eplb is validated. This would raise an error for invalid combinations.

Example:

if self.enable_ep_weight_filter and not self.enable_expert_parallel:
    raise ValueError(
        "enable_expert_parallel must be True to use enable_ep_weight_filter."
    )
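A self-contained version of the suggested check can be sketched as follows. `ParallelConfigSketch` is a hypothetical stand-in with only the two fields under discussion; it is not vLLM's actual `ParallelConfig`, and placing the check in `__post_init__` is an assumption made so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for the config class; only two fields are modeled."""

    enable_expert_parallel: bool = False
    enable_ep_weight_filter: bool = False

    def __post_init__(self) -> None:
        # Reject the combination in which the filter could never take effect.
        if self.enable_ep_weight_filter and not self.enable_expert_parallel:
            raise ValueError(
                "enable_expert_parallel must be True to use "
                "enable_ep_weight_filter."
            )


def is_valid(expert_parallel: bool, weight_filter: bool) -> bool:
    """Return True if the flag combination passes the check above."""
    try:
        ParallelConfigSketch(expert_parallel, weight_filter)
    except ValueError:
        return False
    return True
```

Only the combination with the filter on and expert parallelism off is rejected; the other three combinations are accepted.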

@esmeetu added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Mar 17, 2026
@khluu added this to the v0.18.0 cherry picks milestone on Mar 18, 2026
@esmeetu merged commit 761e0aa into main on Mar 18, 2026
69 checks passed
@esmeetu deleted the opt-ep-weights-filter branch on March 18, 2026 01:36
khluu pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa)
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
maoxx241 pushed a commit to maoxx241/vllm that referenced this pull request Mar 24, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa)
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 5eb1ef3)
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 4011dab)
